Abstract: The detection of encrypted stepping-stone attacks is
considered. Besides encryption and padding, the attacker is capable of
inserting chaff packets and perturbing packet timing and transmission
order. Based on the assumption that packet arrivals form renewal
processes, and that a pair of such renewal processes is also renewal, a
nonparametric detector is proposed to detect attacking traffic by testing
the correlation between interarrival times in the incoming process and
the outgoing process. The detector requires no knowledge of the
interarrival distributions, and it is shown to have exponentially
decaying detection error probabilities for all distributions. The error
exponents are characterized using the Vapnik-Chervonenkis Theory. An
efficient algorithm is proposed based on the detector structure to detect
renewal processes with linearly correlated interarrival times. It is
shown that the proposed algorithm is robust against an amount of chaff
arbitrarily close to the amount of chaff needed to mimic independent
processes.

Keywords: Intrusion detection, Stepping-stone attacks, Statistical
Learning Theory, Nonparametric detection.


I.  Introduction 

A stepping-stone attack is a common way of launching anonymous attacks
[1]. In such an attack, the attacker routes attacking packets to the
victim through a chain of compromised hosts called “stepping stones”.
The victim only sees the last stepping stone, and thus the attacker’s
identity is concealed. The difficulty in defending against such attacks
lies in tracing the attacking path, and the tracing can be decomposed
into detecting every pair of stepping-stone connections on the intrusion
path.

A  sophisticated  attacker  can  modify  the  attacking  traffic  to 
thwart  detection.  In  particular,  he  can  encrypt  and  pad  the 
packets  so  that  no  information  is  revealed  by  the  bit  patterns 
or  the  lengths  of  packets;  the  only  information  available  to  the 
detector  is  the  timing  of  the  traffic.  The  timing,  however,  is 
subject  to  changes  introduced  by  the  attacker  such  as  random 
delay and packet reshuffling. Furthermore, the attacker can mix
attacking traffic with chaff, i.e., dummy traffic generated purely
for the purpose of evading detection. Chaff traffic can be
generated arbitrarily, and it does not need to reach the victim.

In this paper, we consider the problem of detecting encrypted
stepping-stone connections in the presence of chaff. We allow the
attacker to use various evasion strategies including encryption,
padding, changing the packet order and timing, and mixing attacking
packets with chaff. Our goal is to develop techniques that are robust
against the presence of chaff, forcing the attacker to spend a
substantial amount of time transmitting chaff. Such robust techniques,
coupled with constraints on rates, may be one way to minimize the
effectiveness of the attacker.

A.  Related  Work 

Ever since Staniford and Heberlein [1] first considered the
problem of detecting stepping-stone connections, there has
been a continuous evolution of detection techniques as well
as evasion strategies. Early content-based detection techniques
such as [1], [2] are easily defeated by encryption and padding.
Timing-based detection considered in [3]-[5] is not affected
by encryption or padding, but is vulnerable to active timing
perturbation introduced by the attacker.

Donoho et al. [6] first consider randomly delayed stepping-stone
connections, and since then a number of advances have been made in
detecting encrypted, transformed stepping-stone connections; see
[6]-[8]. The key assumption of these methods is that there is a limit
on the attacker’s ability to alter the traffic. For example, Donoho et
al. [6] show that in principle it is possible to detect transformed
Poisson processes if the transformation satisfies a bounded delay.
Wang and Reeves in [7] propose to correlate relayed streams with
independent and identically distributed, order-preserving perturbation
by introducing watermarks into packet interarrival times. Blum et al.
[8] present an algorithm “DETECT-ATTACKS” (DA), the first passive
detection algorithm with guaranteed performance based on the assumption
of bounded delay and bounded peak rate.

When chaff can be inserted to evade detection, many
previous algorithms fail. Blum et al. [8] propose an algorithm
called “DETECT-ATTACKS-CHAFF” (DAC), modified from
their algorithm DA, to deal with limited chaff. Algorithm DAC
tolerates a fixed number of chaff packets by sacrificing the
false alarm probability, but a pair of arbitrarily long streams
can still evade detection by adding a constant number of chaff
packets. Peng et al. [9] and Zhang et al. [10] separately
propose packet-matching schemes for robust detection; it turns
out, however, that these schemes cannot deal with chaff
packets in the incoming process at all.

All of the schemes in [8]-[10] can be defeated by a constant
number of chaff packets. As the traffic size increases, the
fraction of chaff will go to zero. In terms of rate, zero-rate chaff
traffic suffices to evade their detection. The only algorithms
that are known to handle chaff traffic of non-zero rate are
algorithms “DETECT-BOUNDED-DELAY-CHAFF” (DBDC)
and “DETECT-BOUNDED-MEMORY-CHAFF” (DBMC) in
[11]. Algorithm DBDC can detect traffic flows with up to a
$1/(1 + \lambda\Delta)$ fraction of chaff (where $\lambda$ is a design parameter)
if packet delays are bounded by $\Delta$. Algorithm DBMC is
designed for detecting traffic flows through a host which can
hold at most $M$ packets, and is robust against up to a $1/(1 + M)$
fraction of chaff. The drawback of DBDC and DBMC is that
the false alarm probabilities, although shown to go to zero
eventually, can be large for finite sample size. In this paper,
we want to answer the question of whether it is possible to reduce
the false alarm probability by allowing a certain probability of miss detection.


design  the  detector  threshold  to  satisfy  prescribed  performance 
specifications.  The  proposed  detector  is  optimal  under  the 
renewal  assumption  in  the  sense  that  if  the  attacking  packets 
satisfy  the  bounded  memory  or  bounded  delay  constraint,  then 
the  amount  of  chaff  needed  to  evade  detection  is  proportional 
to  the  traffic  size,  and  the  proportion  can  be  arbitrarily  close 
to  what  is  needed  to  mimic  truly  independent  processes. 
An algorithm is proposed to efficiently implement the detector;
it reduces the computation complexity from $O(n^6)$ to
$O(n^2 \log n)$, where $n$ is the sample size.

The  rest  of  the  paper  is  organized  as  follows.  Section  II 
defines  the  problem.  Section  III  presents  a  nonparametric 
detector  to  deal  with  chaff,  and  analyzes  its  performance.  The 
section  also  presents  an  efficient  algorithm  to  implement  the 
detector.  Section  IV  compares  the  robustness  of  the  proposed 
detector with that of existing stepping-stone detectors. Section V
simulates the proposed detector for pairs of renewal
processes  with  bivariate  exponential  interarrival  distributions. 
Then  Section  VI  concludes  the  paper  with  comments  on  some 
practical  issues  about  the  application  of  such  a  detector. 


II.  Problem  Definition 


B.  Summary  of  Results  and  Organization 


Denote  the  packet  arrivals  on  stream  i  as  a  point  process 


We consider robust detection of stepping-stone connections
in the presence of chaff. To the best of our knowledge, no
existing detector has a provable decay rate in the probabilities
of both false alarm and miss detection. The main contribution
of this paper is a quantitative characterization of both false
alarm and miss probabilities by imposing the assumption
that pairs of interarrival times in the incoming and outgoing
processes are independent and identically distributed (i.i.d.).
The i.i.d. assumption is a limiting assumption in the sense that
even if i.i.d. perturbation is applied to a renewal process, the
generated pair of processes may not have i.i.d. interarrivals;
it is, however, general enough to include a wide range of
relayed processes because we do not assume the processes
to satisfy any other statistical property. The stepping-stone
detector should therefore be nonparametric.


$$S_i = \left(s_1^{(i)}, s_2^{(i)}, \ldots\right), \quad i = 1, 2, \ldots$$

where $s_k^{(i)}$ ($k \ge 1$) is the $k$th arrival epoch in stream $i$. Let
$\mathcal{S}_i = \{s_1^{(i)}, s_2^{(i)}, \ldots\}$ be the set of the elements in $S_i$.
Let $(S_1, S_2)$ be a pair of incoming and outgoing streams of interest at a
particular gateway node. Normally, $S_1$ and $S_2$ are independent.
If, however, $S_2$ is a relayed stream of $S_1$, then they will satisfy
certain relations.


Definition 2.1: A pair of streams $(S_1, S_2)$ is a normal pair
if $S_1$ and $S_2$ are independent point processes. It is a
stepping-stone pair if there exists a bijection
$g : \mathcal{S}_1 \to \mathcal{S}_2$ such that $g(s) - s \ge 0$ for any
$s \in \mathcal{S}_1$, and $g$ satisfies certain communication requirements.


We  propose  a  nonparametric  detector  to  detect  renewal 
processes  with  correlated  interarrival  times  based  on  the 
assumption  that  the  pair  formed  by  these  renewal  processes 
is also renewal, i.e., the pairs of interarrival times from the
incoming and outgoing processes are i.i.d.; the detector does
not assume the knowledge of the interarrival distributions. This
detector  applies  to  general  attacking  traffic  with  or  without 
memory  or  delay  constraints.  We  show  that  the  probabilities 
of  miss  detection  and  false  alarm  both  decay  exponentially 
with  the  number  of  packets  used  in  the  detection.  Explicit 
expressions  of  the  error  exponents  are  given  using  the  Vapnik- 
Chervonenkis  (VC)  Theory.  Such  expressions  allow  us  to 


The bijection $g$, unknown to the detector, is a mapping
between the arrival and the departure epochs of the same
packets, allowing permutation of packets during the relay. The
condition that $g$ is a bijection imposes a packet conservation
constraint, i.e., no attacking packets are generated or dropped
at the stepping stones. The condition $g(s) - s \ge 0$ is the causality
constraint, which means that an attacking packet cannot
leave a host before it arrives. Communication requirements
are due to the need of the attacker’s application, the physical
constraints of the relay host, or the communication channel.
Examples include, but are not limited to, the bounded memory
constraint and the bounded delay constraint; see [11].
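
As an illustration of Definition 2.1, the following minimal Python sketch checks whether a candidate mapping between arrival and departure epochs satisfies packet conservation, causality, and (optionally) a bounded delay. The function name `is_valid_relay_mapping`, the index-based encoding of $g$, and the `max_delay` parameter are assumptions made for this example only; the detector itself never observes $g$.

```python
from typing import Optional, Sequence


def is_valid_relay_mapping(
    arrivals: Sequence[float],
    departures: Sequence[float],
    mapping: Sequence[int],
    max_delay: Optional[float] = None,
) -> bool:
    """Check a candidate bijection g, encoded as mapping[i] = index of the
    departure epoch assigned to the i-th arrival epoch (hypothetical encoding)."""
    n = len(arrivals)
    # Packet conservation: g must be a bijection between the two epoch sets.
    if len(departures) != n or sorted(mapping) != list(range(n)):
        return False
    for i, j in enumerate(mapping):
        delay = departures[j] - arrivals[i]
        # Causality: a packet cannot leave the host before it arrives.
        if delay < 0:
            return False
        # Bounded delay: one example of a communication requirement.
        if max_delay is not None and delay > max_delay:
            return False
    return True
```

For instance, arrivals `[0.0, 1.0]` and departures `[1.5, 1.1]` admit both the identity mapping and the swapped mapping `[1, 0]`, which reflects the fact that packets may be permuted during the relay.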


If $S_i$ ($i = 1, 2$) is the mixture of attacking packets and chaff,
then the requirements are relaxed, as stated in the following
definition.

Definition 2.2: A pair of streams $(S_1, S_2)$ is a stepping-stone
pair with chaff if it is the superposition of a stepping-stone
pair $(S'_1, S'_2)$ and a pair of arbitrary streams $(C_1, C_2)$
(see footnote 1).

Stream $C_i$ ($i = 1, 2$) consists of dummy packets called chaff
which do not need to arrive at the destination. Chaff packets
can be generated or dropped at any stepping stones without
affecting the attack.


aim at deriving a detector to test the statistical correlation between
processes. It is desirable that the detector has guaranteed
performance for a wide range of traffic.

In this section, we present a nonparametric detector based
on the statistical learning theory for the hypothesis testing
problem defined in (1). In Section III-A, we introduce a
distance measure, called $\mathcal{A}$-distance, between probability
distributions, and define a detector based on $\mathcal{A}$-distance. We then
address the computation issues in Section III-B, where an
efficient algorithm is proposed to reduce the complexity in
implementing the $\mathcal{A}$-distance detector.


Let the interarrival times of $S_1$ be $X_1, X_2, \ldots$, where
$X_1 = s_1^{(1)}$ and $X_i = s_i^{(1)} - s_{i-1}^{(1)}$ ($i > 1$). Similarly, denote the
interarrival times of $S_2$ by $Y_1, Y_2, \ldots$. If all the transmissions
in the network follow renewal processes, then the $X_i$'s and the $Y_i$'s are
i.i.d., respectively. The problem is that without any constraint
on stepping-stone pairs, $(X_i)_{i=1,2,\ldots}$ and $(Y_j)_{j=1,2,\ldots}$ may
correlate arbitrarily; in general, samples of the pairs $(X_i, Y_i)$
($i = 1, 2, \ldots$) are not sufficient for detection because the
order in which these samples are taken is also relevant. The
hypothesis testing will have the form of

$$H_0 : P(X^n, Y^n) = P(X^n)P(Y^n),$$

$$H_1 : P(X^n, Y^n) \ne P(X^n)P(Y^n),$$

for any $X^n, Y^n \in \mathbb{R}_+^n$. For arbitrary stepping-stone pairs,
the worst case complexity grows exponentially with the sample
size. If, however, the stepping-stone pairs are renewal as well,
i.e., the pairs $(X_i, Y_i)$ ($i = 1, 2, \ldots$) are i.i.d., then the
detection is reduced to a testing of the following single-lettered
hypotheses (see footnote 2):

$$H_0 : P_{XY} = P_X \circ P_Y, \qquad H_1 : P_{XY} \ne P_X \circ P_Y, \tag{1}$$

given realizations of $((X_1, Y_1), (X_2, Y_2), \ldots)$. This is a
nonparametric hypothesis testing problem; no specific assumptions
on the distribution $P_{XY}$ are imposed.


A.  Distance  Measure  and  Detector 


To test $H_0$ against $H_1$, we need to measure the distance
between probability distributions. In a parametric framework,
the conventional distance measure is the Kullback-Leibler
distance [12]. Under the nonparametric framework, however,
the Kullback-Leibler distance cannot be easily replaced by its
finite sample counterpart (see footnote 3). We solve this problem by using the
following pseudo distance measure from [13]:


Definition 3.1 ($\mathcal{A}$-distance and empirical $\mathcal{A}$-distance):
Given probability spaces (see footnote 4) $(\mathcal{X}, \mathcal{F}, P_i)$ ($i = 1, 2$) and a
collection of sets $\mathcal{A} \subseteq \mathcal{F}$, the $\mathcal{A}$-distance between $P_1$ and $P_2$
is defined as

$$d_{\mathcal{A}}(P_1, P_2) = \sup_{A \in \mathcal{A}} |P_1(A) - P_2(A)|.$$

Given two collections of samples $S_1$, $S_2$ drawn independently
and i.i.d. from $P_1$, $P_2$ respectively, the empirical $\mathcal{A}$-distance
$d_{\mathcal{A}}(S_1, S_2)$ is similarly defined by replacing $P_i(A)$ with the
empirical probability

$$S_i(A) = \frac{|S_i \cap A|}{|S_i|},$$

where $|S_i \cap A|$ is the number of samples from $S_i$ that are in
the set $A$.


III. Nonparametric Detection of Renewal Traffic


Donoho  et  al.  in  [6]  have  noticed  that  for  renewal  processes, 
local  timing  perturbation  or  reshuffling  will  not  destroy  the 
correlation  between  processes.  Furthermore,  they  show  that 
nonzero  correlation  can  be  obtained  even  if  the  attacker  inserts 
chaff  independent  of  the  attacking  traffic.  Although  Donoho 
et  al.  do  not  derive  specific  stepping-stone  detectors  in  [6], 
their  work  shows  that,  in  principle,  effective  detection  can 
be  achieved  in  the  presence  of  chaff.  Inspired  by  Donoho 
et  al.  [6],  we  propose  an  alternative  to  existing  algorithmic 
approaches  that  check  strict  memory  or  delay  constraints.  We 

1 Note that $C_1$ and $C_2$ may not have equal length, and either of them can
be empty.

2 We use $P_X \circ P_Y$ to denote the joint probability distribution for $(X, Y)$
in which $X$ and $Y$ are independent with marginals $P_X$ and $P_Y$, respectively.


We see that $d_{\mathcal{A}}(S_1, S_2) \in [0, 1]$. By the Vapnik-Chervonenkis
Inequality [14], it is shown in [15] that $d_{\mathcal{A}}(S_1, S_2)$ can be
arbitrarily close to $d_{\mathcal{A}}(P_1, P_2)$ as the sample size goes to infinity.

Given samples $S = \{(x_i, y_i)\}_{i=1}^n$, let $S_X = \{x_i\}_{i=1}^n$ and
$S_Y = \{y_i\}_{i=1}^n$. With the distance measure defined, we now
specify the detector as follows:

Definition 3.2: Let $\mathcal{A}$ be a collection of measurable subsets
of $[0, \infty) \times [0, \infty)$. Given $\epsilon \in (0, 1)$, the detector using the
$\mathcal{A}$-distance measure to test the hypotheses in (1) is defined as
(see footnote 5)

$$\delta_{d_{\mathcal{A}}}(S, \epsilon) = \begin{cases} 1 & \text{if } d_{\mathcal{A}}(S_X \circ S_Y, S_{XY}) > \epsilon, \\ 0 & \text{o.w.,} \end{cases}$$

3 For example, it can be shown that for a continuous distribution, the empirical
Kullback-Leibler distance is infinite almost surely.

4 We use the convention that $\mathcal{X}$ is the sample space, $\mathcal{F}$ the $\sigma$-field, and $P_i$
the probability measure.


$B(k, l)$'s, and the computation for each $B(k, l)$ takes $O(n^2)$
time. By proper updating, however, we can reduce the complexity
to $O(n^2 \log n)$ as shown in the algorithm below.


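where $d_{\mathcal{A}}(S_X \circ S_Y, S_{XY})$ is the empirical $\mathcal{A}$-distance between
$S_X \circ S_Y$ and $S_{XY}$, defined as

$$d_{\mathcal{A}}(S_X \circ S_Y, S_{XY}) = \sup_{A \in \mathcal{A}} |S_X \circ S_Y(A) - S_{XY}(A)|,$$

with $S_X \circ S_Y(A) = |(S_X \times S_Y) \cap A| / |S|^2$, and
$S_{XY}(A) = |S \cap A| / |S|$.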

The  definition  involves  calculating  the  supremum  over  a 
possibly  infinite  collection  of  sets.  The  computation  of  the 
statistics  will  be  addressed  in  Section  III-B. 
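
To make the definition concrete, the following Python sketch evaluates the empirical $\mathcal{A}$-distance by brute force for a finite collection of sets. It is only an illustration under stated assumptions: each set is represented by a hypothetical indicator function `contains(x, y)`, the collection is finite so that the supremum becomes a maximum, and the names `empirical_a_distance` and `a_distance_detector` are introduced here for the example.

```python
from typing import Callable, Iterable, Sequence, Tuple

Point = Tuple[float, float]
SetIndicator = Callable[[float, float], bool]


def empirical_a_distance(samples: Sequence[Point], sets: Iterable[SetIndicator]) -> float:
    """Maximum over the given sets of |S_X o S_Y(A) - S_XY(A)|."""
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    best = 0.0
    for contains in sets:
        # S_XY(A): fraction of the n observed pairs that fall in A.
        joint = sum(contains(x, y) for x, y in samples) / n
        # S_X o S_Y(A): fraction of the n^2 "product samples" that fall in A.
        product = sum(contains(x, y) for x in xs for y in ys) / n**2
        best = max(best, abs(product - joint))
    return best


def a_distance_detector(samples: Sequence[Point], sets: Iterable[SetIndicator], eps: float) -> int:
    """Return 1 (H_1) if the empirical distance exceeds eps, and 0 (H_0) otherwise."""
    return 1 if empirical_a_distance(samples, sets) > eps else 0
```

For three perfectly correlated pairs `[(1, 1), (2, 2), (3, 3)]` and the single set $\{(x, y) : |y - x| \le 0.5\}$, all three pairs but only three of the nine product samples fall in the set, so the sketch returns $|3/9 - 1| = 2/3$.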


B.  Efficient  Computation  of  Test  Statistics 

Here we address the issue of computing the test statistics
$d_{\mathcal{A}}(S_X \circ S_Y, S_{XY})$ defined in Definition 3.2. We give an
algorithm to compute $d_{\mathcal{A}}(S_X \circ S_Y, S_{XY})$ efficiently for the
class of bands tilted to a certain angle.

Consider $\mathcal{A}$ as the class of bands tilted to 45° with respect to
the $x$-axis, i.e., $A \in \mathcal{A}$ is of the form $\{(x, y) : y - x \in [a, b]\}$
for some $a < b$. The rationale for this choice of $\mathcal{A}$ is that in
stepping-stone connections, the $n$th arrival $\sum_{i=1}^{n} X_i$ and the $n$th
departure $\sum_{i=1}^{n} Y_i$ will not diverge unboundedly, so we expect
$X_i \approx Y_i$. Thus the samples of interarrival pairs from stepping-stone
traffic often cluster around the unit line $x = y$ with
some noise; bands around the unit line can reveal significant
difference between normal traffic and stepping-stone traffic.


Fig. 1. Example: $n = 3$. •: sample; o: “product sample”; $B(2, 6)$: the 45°
band between $s'_2$ and $s'_6$.

1) SEARCH-TILTED-BANDS (STB): Algorithm STB implements
the $\mathcal{A}$-distance detector for the class of 45° bands
efficiently. Define

$$F(k) \triangleq \frac{|(S_X \times S_Y) \cap B(1, k)|}{|S|^2} - \frac{|S \cap B(1, k)|}{|S|}$$

for $k = 1, \ldots, n^2$, and $F(0) = 0$. Then we have that

$$d_{\mathcal{A}}(S_X \circ S_Y, S_{XY}) = \max_{0 \le k < l \le n^2} |F(l) - F(k)| = \max_{0 \le k \le n^2} F(k) - \min_{0 \le k \le n^2} F(k). \tag{2}$$

Algorithm STB computes $d_{\mathcal{A}}(S_X \circ S_Y, S_{XY})$ by computing
$F(k)$ efficiently. The algorithm is shown in Table I.


Given a set of samples $S = \{(x_i, y_i)\}_{i=1}^n$, the “product
samples” $S_X \times S_Y$ are the set of the $n^2$ points $\{(x_i, y_j)\}_{i,j=1}^n$.
Sort the “product samples” into $(s'_1, s'_2, \ldots, s'_{n^2})$, where
$s'_k = (x'_k, y'_k)$, such that $x'_1 - y'_1 \le x'_2 - y'_2 \le \cdots$.
Geometrically, this sorting allows us to scan the “product samples” in the order
they cross the 45° line as it moves from northwest to southeast,
as illustrated in Fig. 1. Let $B(k, l)$ ($k \le l$) be the 45° band
with boundaries passing through $s'_k$ and $s'_l$ respectively (e.g.,
$B(2, 6)$ in Fig. 1), i.e.,

$$B(k, l) = \{(x, y) : x - y \in [x'_k - y'_k,\; x'_l - y'_l]\}.$$

We have that

$$d_{\mathcal{A}}(S_X \circ S_Y, S_{XY}) = \max_{1 \le k \le l \le n^2} \left| \frac{|(S_X \times S_Y) \cap B(k, l)|}{|S|^2} - \frac{|S \cap B(k, l)|}{|S|} \right|.$$

For $|S| = n$, an exhaustive search to compute
$d_{\mathcal{A}}(S_X \circ S_Y, S_{XY})$ will take $O(n^6)$ time, since there are $O(n^4)$

5 We use the convention that the detector gives the value 1 for $H_1$ and 0
for $H_0$.


TABLE I

SEARCH-TILTED-BANDS (STB)

SEARCH-TILTED-BANDS(S, ε):
    for i, j = 1 : n
        D((j - 1)n + i) = x_i - y_j;
    end;
    [D, I] = sort(D);
    F_min = F_max = F(0) = 0;
    for k = 1 : n^2
        F(k) = F(k - 1) + 1/n^2 - 1/n    if I(k) mod (n + 1) == 1,
               F(k - 1) + 1/n^2          o.w.;
        F_min = min(F_min, F(k));
        F_max = max(F_max, F(k));
    end
    if F_max - F_min > ε return ATTACK;
    else return NORMAL;

In STB, $I$ is an index array where $I(k)$ is the index of
the $k$th smallest entry in $D$. If $s'_k$ is the “product sample”
corresponding to $D(I(k))$, we have that

$$F(k) = \begin{cases} F(k-1) + \dfrac{1}{n^2} - \dfrac{1}{n} & \text{if } s'_k \in S, \\ F(k-1) + \dfrac{1}{n^2} & \text{o.w.} \end{cases}$$

Note that $s'_k \in S$ if and only if $I(k) = (i - 1)n + i =
(i - 1)(n + 1) + 1$ for some $i \in \{1, \ldots, n\}$; therefore, $s'_k \in S$ is
equivalent to $I(k) \bmod (n + 1) = 1$. Thus, STB can compute
$F(k)$ ($k = 1, \ldots, n^2$) by an $O(n^2)$ updating. The sorting of
$D$ is the most time-consuming step, and it takes $O(n^2 \log n)$.
Therefore, STB implements the $\mathcal{A}$-distance detector for the
class of 45° bands in $O(n^2 \log n)$ time.

We  point  out  that  STB  can  be  easily  modified  to  detect  other 
forms  of  linear  correlation  by  changing  the  order  in  scanning 
the  “product  samples”. 
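
The procedure in Table I can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names `stb_statistic` and `stb_detect` are chosen for this example, a boolean flag replaces the index test $I(k) \bmod (n + 1) = 1$ to mark the samples in $S$, and the differences $x_i - y_j$ are assumed to have no ties other than among the samples themselves (which holds almost surely for continuous distributions).

```python
from typing import Sequence


def stb_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Empirical A-distance over 45-degree bands, computed by a sorted scan."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    # One entry per "product sample" (x_i, y_j): its difference x_i - y_j and
    # a flag telling whether it is an observed pair (i == j).
    diffs = [(x[i] - y[j], i == j) for j in range(n) for i in range(n)]
    diffs.sort(key=lambda t: t[0])  # the O(n^2 log n) step
    f = f_min = f_max = 0.0
    for _, is_sample in diffs:
        f += 1.0 / n**2          # every product sample adds 1/n^2
        if is_sample:
            f -= 1.0 / n         # an observed pair also subtracts 1/n
        f_min = min(f_min, f)
        f_max = max(f_max, f)
    return f_max - f_min


def stb_detect(x: Sequence[float], y: Sequence[float], eps: float) -> bool:
    """Return True (ATTACK) if the statistic exceeds eps, else False (NORMAL)."""
    return stb_statistic(x, y) > eps
```

For `x = y = [1, 2, 3]`, the running sum rises to $1/3$ over the three negative differences, falls to $-1/3$ over the three observed pairs, and returns to $0$, so the statistic is $2/3$.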

IV. Performance of the $\mathcal{A}$-distance Detector

We now analyze the performance of $\delta_{d_{\mathcal{A}}}$. We show that
it has exponentially decaying error probabilities on both false
alarm and miss detection. We derive uniform upper bounds on
the error probabilities by applying the Vapnik-Chervonenkis
Theory. It is desirable that the detector is robust against the
insertion of chaff. We characterize the robustness of $\delta_{d_{\mathcal{A}}}$ by
deriving the minimum chaff required to have nonzero miss
probability.

A.  Error  Probabilities 

In this section, we characterize the error probabilities of the
detector $\delta_{d_{\mathcal{A}}}$ as a function of the sample size $n$, the threshold
value $\epsilon$, and the searching class $\mathcal{A}$. It is known that each
class of measurable sets is associated with a positive integer
called Vapnik-Chervonenkis dimension (VC-dimension) which
measures the complexity of the class [14]. For a collection $\mathcal{A}$
with finite VC-dimension, we derive the following exponential
upper bounds on the error probabilities of $\delta_{d_{\mathcal{A}}}$.

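Theorem 4.1: Let $S = \{(x_i, y_i)\}_{i=1}^n$ be drawn i.i.d. from
$P_{XY}$, and $\mathcal{A}$ have finite VC-dimension $d$. Then for arbitrary
distribution $P_{XY}$, the false alarm probability of $\delta_{d_{\mathcal{A}}}$ satisfies

$$P_F(\delta_{d_{\mathcal{A}}}) \le 8(2n + 1)^d e^{-n\epsilon^2/32}.$$

Moreover, if $d_{\mathcal{A}}(P_X \circ P_Y, P_{XY}) > \epsilon$, then the miss probability
satisfies

$$P_M(\delta_{d_{\mathcal{A}}}) \le 8(2n + 1)^d e^{-n\left(d_{\mathcal{A}}(P_X \circ P_Y, P_{XY}) - \epsilon\right)^2/32}.$$

Proof: See Appendix. ■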

Remark: Theorem 4.1 provides uniform upper bounds on
the error probabilities of $\delta_{d_{\mathcal{A}}}$. It guarantees that under any
distribution, $\delta_{d_{\mathcal{A}}}$ can perform arbitrarily well with sufficiently
large samples (note that a condition needs to be satisfied
for diminishing miss probability). The error exponent for
false alarm probability increases with $\epsilon$, whereas that for
miss probability decreases with $\epsilon$. Therefore, the threshold $\epsilon$
represents a tradeoff between false alarm and miss detection.
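
The bounds of Theorem 4.1 are easy to evaluate numerically, which helps to see how large $n$ must be before they become informative. The sketch below is illustrative only; the function names `false_alarm_bound` and `miss_bound` are assumptions of this example.

```python
import math


def false_alarm_bound(n: int, eps: float, d: int) -> float:
    """Upper bound 8 (2n + 1)^d exp(-n eps^2 / 32) on the false alarm probability."""
    return 8.0 * (2 * n + 1) ** d * math.exp(-n * eps**2 / 32.0)


def miss_bound(n: int, eps: float, d: int, distance: float) -> float:
    """Upper bound on the miss probability; `distance` stands for
    d_A(P_X o P_Y, P_XY) and must exceed eps for the bound to apply."""
    if distance <= eps:
        raise ValueError("the bound requires distance > eps")
    return 8.0 * (2 * n + 1) ** d * math.exp(-n * (distance - eps) ** 2 / 32.0)
```

Because of the polynomial factor $8(2n + 1)^d$, both bounds exceed 1 (and are therefore vacuous) for small $n$; for example, `false_alarm_bound(100, 0.5, 2)` is far larger than 1, whereas `false_alarm_bound(607, 0.99, 2)` is just below 0.1.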


B.  Robustness  Against  Chaff 

It is shown in [16] that it is possible for the attacker to
evade any detector by inserting sufficient chaff. There is,
however, a limit on the minimum amount of chaff needed
to do so. Specifically, it is shown in [16] that the minimum
asymptotic fraction of chaff needed to mimic independent
Poisson processes of rates no more than $\lambda$ is $1/(1 + \lambda\Delta)$
for attacking traffic with bounded delay $\Delta$, and $1/(1 + M)$
for attacking traffic through a host with bounded memory $M$.
This minimum fraction gives a fundamental limit on the amount
of chaff that any detector can handle.

In this section, we will show that the $\mathcal{A}$-distance detector
can achieve robustness arbitrarily close to the fundamental
limit for a class of joint distributions called the bivariate
exponential distribution, derived by Marshall and Olkin in
[17]. A pair of nonnegative random variables $(X, Y)$ satisfies
the bivariate exponential distribution BVE($\lambda_1$, $\lambda_2$, $\lambda_{12}$) if its
joint survival function is given by

$$\Pr\{X > s, Y > t\} = e^{-\lambda_1 s - \lambda_2 t - \lambda_{12} \max(s, t)}, \quad s, t \ge 0. \tag{3}$$

The importance of this definition of bivariate exponential
distribution is that it preserves the memoryless property of
the univariate exponential distribution.

For the bivariate exponential distribution defined above, we
characterize the amount of chaff required to evade the
$\mathcal{A}$-distance detector in the following theorem.

Theorem 4.2: Suppose we use the $\mathcal{A}$-distance detector with
threshold $\epsilon \in (0, 1)$ and $\mathcal{A}$ being the class of 45° bands. If
$(S_1, S_2)$ is a stepping-stone pair in which the pairs of interarrival
times $(X_i, Y_i)$ ($i = 1, 2, \ldots$) have i.i.d. bivariate exponential
distribution, and the rates of $S_1$ and $S_2$ are bounded by
$\lambda$, then the minimum fraction of chaff to have nonzero miss
probability is lower bounded by $(1 - \epsilon)/(1 + M)$ for stepping-stone
pairs with bounded memory $M$, and $(1 - \epsilon)/(1 + \lambda\Delta)$
for stepping-stone pairs with bounded delay $\Delta$.

Proof: See Appendix. ■

Remark: Theorem 4.2 says that the $\mathcal{A}$-distance detector can
detect any correlation in bivariate exponential distribution.
By Theorem 4.1, we see that by increasing sample size, $\epsilon$
can be made arbitrarily close to 0 while keeping the false
alarm probability bounded by a certain level. Therefore, for long
connections, the robustness of the $\mathcal{A}$-distance detector can be
arbitrarily close to the optimal.

For the attacker, the actual value of $\epsilon$ may be unknown.
Then the attacker is faced with a tradeoff between the amount
of chaff and the level of protection; he can save $100\epsilon\%$ of
chaff by taking the risk of having $\epsilon$ correlation.
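
The lower bounds in Theorem 4.2 can be evaluated directly, as in the short sketch below. The function and parameter names are assumptions of this example; `rate` stands for the rate bound $\lambda$, `max_delay` for $\Delta$, and `memory` for $M$.

```python
def min_chaff_fraction_bounded_delay(eps: float, rate: float, max_delay: float) -> float:
    """Lower bound (1 - eps) / (1 + rate * max_delay) from Theorem 4.2."""
    return (1.0 - eps) / (1.0 + rate * max_delay)


def min_chaff_fraction_bounded_memory(eps: float, memory: int) -> float:
    """Lower bound (1 - eps) / (1 + memory) from Theorem 4.2."""
    return (1.0 - eps) / (1.0 + memory)
```

For example, with $\epsilon = 0.1$, $\lambda = 2$, and $\Delta = 0.5$, the bounded-delay bound is $0.9 / 2 = 0.45$, compared with the fundamental limit $1/(1 + \lambda\Delta) = 0.5$.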


V.  Simulation 


We implement the $\mathcal{A}$-distance detector using STB to verify
the performance. We let $P_{XY}$ be the bivariate exponential
distribution defined in Section IV-B. It is shown in [17] that the
correlation coefficient $\rho$ between bivariate exponential random
variables $X$ and $Y$ is

$$\rho = \lambda_{12}/(\lambda_1 + \lambda_2 + \lambda_{12}),$$

where $\lambda_i$ ($i = 1, 2, 12$) are parameters in the definition (3).
We will test the performance of the $\mathcal{A}$-distance detector on
processes with bivariate exponentially distributed interarrival
times of various correlation levels. In practice, this corresponds
to the case when attacking packets arrive according to a
Poisson process of rate $\lambda_{12}$, and are relayed immediately
without delay, but the attacker inserts chaff packets according
to independent Poisson processes of rates $\lambda_1$ and $\lambda_2$ in the
incoming and outgoing streams, respectively.
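
Interarrival pairs with this distribution can be generated with the standard construction of the Marshall-Olkin model from three independent exponential variables, $X = \min(Z_1, Z_{12})$ and $Y = \min(Z_2, Z_{12})$. The Python sketch below is only an illustration of that construction; the function name `sample_bve` and its signature are assumptions of this example, and all three rates are assumed strictly positive.

```python
import random
from typing import List, Tuple


def sample_bve(lam1: float, lam2: float, lam12: float, n: int,
               rng: random.Random) -> List[Tuple[float, float]]:
    """Draw n i.i.d. pairs (X, Y) from BVE(lam1, lam2, lam12).

    Z1 and Z2 play the role of the independent chaff arrivals, and Z12 the
    role of the attacking packets shared by both streams.
    """
    pairs = []
    for _ in range(n):
        z1 = rng.expovariate(lam1)
        z2 = rng.expovariate(lam2)
        z12 = rng.expovariate(lam12)
        pairs.append((min(z1, z12), min(z2, z12)))
    return pairs
```

With $\lambda_1 = \lambda_2 = 0.5$ and $\lambda_{12} = 9$, the correlation is $\rho = 9/10$, and roughly 90% of the generated pairs satisfy $X = Y$, consistent with the property $P_{XY}(X = Y) = \rho$ used in the Appendix.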


Before starting the simulation, we have to solve a couple of
implementation problems. The first problem is how to decide
the detection threshold $\epsilon$. In the Neyman-Pearson framework,
we want to set the threshold to the smallest possible value as
long as the false alarm probability is bounded by a prescribed
value $\alpha \in (0, 1)$. A common way of setting the threshold
in nonparametric detection is to use training. Training is
computation intensive. Furthermore, the training data is not
guaranteed to represent all the normal traffic in a network with
many different traffic types. We propose to set the threshold
by making the false alarm upper bound in Theorem 4.1 equal
to $\alpha$. Then we can write the threshold as

$$\epsilon(n) = \sqrt{-\frac{32}{n} \log \frac{\alpha}{8(2n + 1)^d}},$$

where $d$ is the VC-dimension of $\mathcal{A}$. Theorem 4.1 guarantees
that the false alarm probability will be bounded by $\alpha$ under
arbitrary interarrival distributions. For the class of 45° bands,
it is easy to show by the method of Wenocur and Dudley [18]
that $d = 2$.


Next, we need to choose the sample size. Since the threshold
$\epsilon(n)$ is conservative, the detector often needs a large number
of samples to have reasonably small miss probability. We
need a guideline on approximately how many samples are
needed to have reasonable detection performance. We use the
results in Theorem 4.1 to estimate the minimum sample size.
In Theorem 4.1, it is proved that the miss probability decays
exponentially fast if $d_{\mathcal{A}}(P_X \circ P_Y, P_{XY}) > \epsilon$. Thus we estimate
the minimum sample size as the smallest integer $n$ that satisfies
$\epsilon(n) < d_{\mathcal{A}}(P_X \circ P_Y, P_{XY})$.
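
The threshold rule and the sample-size estimate can be sketched as follows. This is an illustrative computation under stated assumptions: the logarithm is taken to be natural, the search is a simple linear scan, and the names `threshold` and `min_sample_size` as well as the cap `n_max` are introduced for this example.

```python
import math


def threshold(n: int, alpha: float, d: int) -> float:
    """Threshold eps(n) that makes the false alarm bound of Theorem 4.1 equal alpha."""
    return math.sqrt((32.0 / n) * math.log(8.0 * (2 * n + 1) ** d / alpha))


def min_sample_size(distance: float, alpha: float, d: int, n_max: int = 10**6) -> int:
    """Smallest n with eps(n) < distance, where `distance` stands for
    d_A(P_X o P_Y, P_XY); eps(n) is decreasing in n, so the first hit is the answer."""
    for n in range(1, n_max + 1):
        if threshold(n, alpha, d) < distance:
            return n
    raise ValueError("no sample size up to n_max satisfies eps(n) < distance")
```

With $\alpha = 0.1$, $d = 2$, and the distance set to the correlation $\rho$ (which the Appendix shows is the $\mathcal{A}$-distance for 45° bands under the bivariate exponential model), this sketch gives 854, 752, 666, and 607 for $\rho = 0.85$, $0.90$, $0.95$, and $0.99$, matching the estimates reported with Fig. 2.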


In our simulation, we let $\alpha = 0.1$, and vary the correlation
$\rho$ among 0.85, 0.90, 0.95, and 0.99. The simulated miss
detection probabilities of STB are plotted in Fig. 2. We see
that there is a critical sample size beyond which the miss
probability quickly drops from 1 to close to 0, and this critical
sample size decreases as the correlation value increases. For
$\rho_1 = 0.85$, $\rho_2 = 0.90$, $\rho_3 = 0.95$, and $\rho_4 = 0.99$, our estimates
of the minimum sample sizes are $n_1 = 854$, $n_2 = 752$, $n_3 = 666$,
and $n_4 = 607$ respectively (see Fig. 2). We see that our
estimates agree with the simulation curves very well.


[Figure: Miss Detection vs. Sample Size, $\alpha = 0.1$]

Fig. 2. Simulated miss detection probabilities of STB: $\alpha = 0.1$; 10000
Monte Carlo runs; $\rho_i$ ($i = 1, \ldots, 4$): the correlation between $X_i$ and $Y_i$;
$n_i$: the estimated minimum sample size for $\rho_i$.


VI.  Conclusion 

In  this  paper,  we  have  developed  a  nonparametric  method  to 
detect  stepping-stone  traffic  by  correlating  the  time  intervals 
between  packet  arrivals.  We  point  out  that  the  i.i.d.  assumption 
on  pairs  of  interarrival  times  is  crucial  for  the  proposed 
detector  to  work.  It  means  that  not  only  do  the  processes  in 
consideration  need  to  be  renewal  marginally,  but  their  pair 
has  to  be  renewal  as  well.  In  practice,  this  detector  should 
be  combined  with  a  preprocessor  to  filter  out  the  non-renewal 
processes. 


VII.  Appendix 
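A. Proof of Theorem 4.1


The proof uses results derived from the Vapnik-Chervonenkis
Theory. In [15], we have proved that for arbitrary distribution $P$,
if $S$ is a collection of $n$ i.i.d. samples drawn from $P$, and
$\mathcal{A}$ is a class of measurable sets with VC-dimension $d$, then

$$\Pr\{d_{\mathcal{A}}(S, P) > \epsilon\} \le 4(2n + 1)^d e^{-n\epsilon^2/8}, \tag{4}$$

where $d_{\mathcal{A}}(S, P)$ is the $\mathcal{A}$-distance between the empirical
distribution according to $S$ and $P$. Applying (4), we have

$$\Pr\{d_{\mathcal{A}}(S_{XY}, P_{XY}) > \epsilon\} \le 4(2n + 1)^d e^{-n\epsilon^2/8}, \tag{5}$$

$$\Pr\{d_{\mathcal{A}}(S_X \circ S_Y, P_X \circ P_Y) > \epsilon\} \le 4(2n + 1)^d e^{-n\epsilon^2/8}. \tag{6}$$

Now we are ready to bound the error probabilities. Since
$d_{\mathcal{A}}(\cdot, \cdot)$ satisfies the triangle inequality, we have

$$d_{\mathcal{A}}(S_X \circ S_Y, S_{XY}) \le d_{\mathcal{A}}(P_X \circ P_Y, P_{XY}) + d_{\mathcal{A}}(S_X \circ S_Y, P_X \circ P_Y) + d_{\mathcal{A}}(S_{XY}, P_{XY}), \tag{7}$$

$$d_{\mathcal{A}}(S_X \circ S_Y, S_{XY}) \ge d_{\mathcal{A}}(P_X \circ P_Y, P_{XY}) - d_{\mathcal{A}}(S_X \circ S_Y, P_X \circ P_Y) - d_{\mathcal{A}}(S_{XY}, P_{XY}). \tag{8}$$

Under $H_0$, $d_{\mathcal{A}}(P_X \circ P_Y, P_{XY}) = 0$. Thus, by (7),

$$P_F(\delta_{d_{\mathcal{A}}}) = \Pr\{d_{\mathcal{A}}(S_X \circ S_Y, S_{XY}) > \epsilon\}$$

$$\le \Pr\{d_{\mathcal{A}}(S_X \circ S_Y, P_X \circ P_Y) + d_{\mathcal{A}}(S_{XY}, P_{XY}) > \epsilon\} \tag{9}$$

$$\le \Pr\left\{d_{\mathcal{A}}(S_X \circ S_Y, P_X \circ P_Y) > \frac{\epsilon}{2}\right\} + \Pr\left\{d_{\mathcal{A}}(S_{XY}, P_{XY}) > \frac{\epsilon}{2}\right\}$$

$$\le 8(2n + 1)^d e^{-n\epsilon^2/32}, \tag{10}$$

where (10) is obtained by plugging in (5) and (6) with $\epsilon$ replaced by $\epsilon/2$.

Under $H_1$, if $P_{XY}$ satisfies the condition
$d_{\mathcal{A}}(P_X \circ P_Y, P_{XY}) > \epsilon$, then, by (8), we have

$$P_M(\delta_{d_{\mathcal{A}}}) = \Pr\{d_{\mathcal{A}}(S_X \circ S_Y, S_{XY}) \le \epsilon\}$$

$$\le \Pr\{d_{\mathcal{A}}(S_X \circ S_Y, P_X \circ P_Y) + d_{\mathcal{A}}(S_{XY}, P_{XY}) \ge d_{\mathcal{A}}(P_X \circ P_Y, P_{XY}) - \epsilon\}.$$

Following the same derivation as after (9) yields

$$P_M(\delta_{d_{\mathcal{A}}}) \le 8(2n + 1)^d e^{-n\left(d_{\mathcal{A}}(P_X \circ P_Y, P_{XY}) - \epsilon\right)^2/32}.$$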


B.  Proof  of  Theorem  4.2 

By Theorem 4.1, we see that to have non-vanishing miss
probability, the attacker has to make $d_{\mathcal{A}}(P_X \circ P_Y, P_{XY}) \le \epsilon$.
If $P_{XY}$ is the bivariate exponential distribution (BVE) defined
in Section IV-B with correlation $\rho$, then it is shown in [17] that
$P_{XY}$ satisfies $P_{XY}(X = Y) = \rho$. For $\mathcal{A}$ being the class of
45° bands, we have that $d_{\mathcal{A}}(P_X \circ P_Y, P_{XY}) = P_{XY}(X = Y)$.
Thus to evade the $\mathcal{A}$-distance detector, the attacker needs to
mimic Poisson processes with correlation $\rho \le \epsilon$.

In [8], Blum et al. present an algorithm called
“BOUNDED-GREEDY-MATCH” (BGM) to embed traffic
with bounded delay into a pair of arbitrary processes; they
show that BGM is optimal in that it always inserts the
minimum number of chaff packets. In [16], we propose another
algorithm called “BOUNDED-MEMORY-RELAY” (BMR),
which inserts the minimum number of chaff packets in embedding
traffic through a host with bounded memory into arbitrary
processes. Therefore, the best way of making attacking traffic
with bounded delay or memory mimic given $(S_1, S_2)$ is to
embed packet transmissions by BGM or BMR, respectively.

The rest of the proof directly follows from the performance
of BGM and BMR. It is shown in [16] that the minimum
fractions of chaff inserted by BGM and BMR into a pair
of independent Poisson processes of rate bounded by $\lambda$ are
$1/(1 + \lambda\Delta)$ and $1/(1 + M)$, respectively. Furthermore, for
BVE distributions, it is shown in [17] that $S_i$ ($i = 1, 2$) can
be written as a superposition of Poisson processes $P_i$ and $P_3$,
where $P_1$, $P_2$, and $P_3$ are independent, with rates $\lambda_1$, $\lambda_2$, and
$\lambda_{12}$, respectively. If we embed packets into $(P_1, P_2)$ by BGM
or BMR (assume $P_3$ does not contain any chaff), we obtain a
lower bound on the fraction of chaff as $(1 - \rho)/(1 + \lambda\Delta)$ and
$(1 - \rho)/(1 + M)$. Combining these results with the constraint
$\rho \le \epsilon$ completes the proof.